Omnidirectional audio-visual talker localizer with dynamic feature fusion based on validity and reliability criteria

Authors

  • Yuki Denda
  • Takanobu Nishiura
  • Yoichi Yamashita
Abstract

Talker localization is indispensable in video conferencing. Statistical audio-visual (AV) talker localizers, which fuse AV features based on prior statistical properties, are ideal in principle. However, those statistical properties must be estimated before the AV feature fusion procedure. To overcome this problem, this paper proposes a novel, robust, omnidirectional AV talker localizer that dynamically fuses AV features based on validity and reliability criteria, eliminating the need for prior statistical properties. AV features are extracted from the captured AV signals by estimating the direction of arriving speech with an equilateral triangular microphone array and by detecting human positions with an omnidirectional video camera. The validity criterion, called the audio- or visual-localization counter, validates both features. The reliability criterion, called the evaluator of directional-speech arriving, acts as the weight for dynamic AV feature fusion. Talker localization experiments in an actual office room confirmed that the proposed AV localizer based on dynamic feature fusion outperforms conventional localizers that use either audio or visual features alone.
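
The abstract does not give the fusion formula, so the following is only a minimal sketch of how validity gating and reliability weighting could combine two direction estimates. Everything in it is an assumption made for illustration: the function name `fuse_azimuth`, the use of boolean validity flags in place of the paper's localization counter, a reliability value in [0, 1] standing in for the paper's evaluator, and the circular weighted mean as the fusion rule.

```python
import math
from typing import Optional


def fuse_azimuth(audio_deg: Optional[float],
                 visual_deg: Optional[float],
                 audio_valid: bool,
                 visual_valid: bool,
                 audio_reliability: float) -> Optional[float]:
    """Fuse audio and visual azimuth estimates (degrees) into one direction.

    Hypothetical illustration, not the paper's algorithm:
    - validity flags gate which features may be used at all;
    - audio_reliability in [0, 1] weights audio against visual.
    Returns an azimuth in [0, 360), or None if no feature is valid.
    """
    use_audio = audio_valid and audio_deg is not None
    use_visual = visual_valid and visual_deg is not None

    if not use_audio and not use_visual:
        return None                      # nothing trustworthy this frame
    if use_audio and not use_visual:
        return audio_deg % 360.0         # fall back to audio only
    if use_visual and not use_audio:
        return visual_deg % 360.0        # fall back to visual only

    # Both features are valid: clamp the weight and average on the unit
    # circle so that 350 deg and 10 deg fuse to about 0 deg, not 180 deg.
    w = min(max(audio_reliability, 0.0), 1.0)
    a = math.radians(audio_deg)
    v = math.radians(visual_deg)
    x = w * math.cos(a) + (1.0 - w) * math.cos(v)
    y = w * math.sin(a) + (1.0 - w) * math.sin(v)

    if math.hypot(x, y) < 1e-9:
        # The weighted vectors cancel (opposite directions, equal weight);
        # keep the estimate with the larger weight instead of guessing.
        return (audio_deg if w >= 0.5 else visual_deg) % 360.0
    return math.degrees(math.atan2(y, x)) % 360.0
```

In such a scheme the weight would be recomputed for every frame, which is what makes the fusion dynamic: a frame with strongly directional speech leans on the microphone array, while a silent or reverberant frame leans on the camera.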

Related resources

Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis

In this paper, we address the problem of automatic discovery of speech patterns using audio-visual information fusion. Unlike previous studies based on the audio modality alone, our work uses not only acoustic information but also visual features extracted from the mouth region. To improve the effectiveness of the use of multimodal information, several audio-visua...

Psychometric feature of the child and parent versions of psychological inflexibility in pain scale (PIPS) in children with chronic pain and their parents

Background: The aim of this study was to investigate the validity, reliability and factor structure of the child and parent's version of psychological inflexibility in pain scale (PIPS) in the population of children with chronic pain and their parents. Methods: The sample consisted of 112 pairs of children and parents, selected through available sampling method from the Tehran Children's Hospi...

Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition

During the fusion of audio and video information for speech recognition, estimating the reliability of the noise-affected audio channel is crucial for obtaining meaningful recognition results. In this paper we compare two types of reliability measures. One is the use of the statistics of the phoneme a-posteriori probabilities and the other is the analysis of the audio signal itself. We implemen...

Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

This paper investigates the estimation of fusion weights under varying acoustic noise conditions for audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model level and decision level fusion via dynamic Bayesian networks (DBNs). A novel methodology known as support vector regression (SVR) is utilized to estimate the fusion weights directly ...

Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons

Jointly using audio and video features can increase the robustness of automatic speech recognition systems in noisy environments. A systematic and reliable performance gain, however, is only achieved if the contributions of the audio and video stream to the decoding decision are dynamically optimized, for example via so-called stream weights. In this paper, we address the problem of dynamic str...

Publication date: 2007